Abstract

Optimal sequence alignments depend heavily on alignment scoring parameters. 
Given input sequences, parametric alignment is the well-studied problem that asks 
for all possible optimal alignment summaries as parameters vary, as well as the op- 
timality region of alignment scoring parameters which yield each optimal alignment. 
But biologically correct alignments might be suboptimal for all parameter choices. 
Thus we extend parametric alignment to parametric k-best alignment, which asks
for all possible k-tuples of k-best alignment summaries $(s_1, s_2, \ldots, s_k)$, as well as
the k-best optimality region of scoring parameters which make $s_1, s_2, \ldots, s_k$ the top
k summaries. By exploiting the integer structure of alignment summaries, we show
that, astonishingly, the complexity of parametric k-best alignment is only polynomial
in k. Thus parametric k-best alignment is tractable, and can be applied at the
whole-genome scale like parametric alignment.

1 Introduction



In pairwise sequence alignment, we are given a pair of homologous sequences $\sigma_1, \sigma_2$,
and each alignment $A$ of $\sigma_1, \sigma_2$ is endowed with an alignment summary $s(A) \in \mathbb{Z}^d$
that records various features of $A$, such as the number of mismatches and the number
of spaces. Throughout we will assume the dimension $d$ of alignment summaries is
fixed, and that sequences are of length $O(n)$. The score of alignment $A$ is defined
to be $c \cdot s(A)$, where $c$ is a fixed vector of alignment scoring parameters. A (global)
optimal alignment is any alignment $A$ which maximizes the alignment score. For
most choices of alignment summary model, the optimal alignment summary can be
computed in $O(n^2)$ time by the Needleman-Wunsch (NW) algorithm [18], once the
value $c$ of the alignment scoring parameters is given.
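To make the scoring model concrete, here is a minimal Python sketch of NW for one assumed summary model with $d = 2$, where a summary is the pair (mismatches, spaces). The function name `nw_optimal_summary` and the choice of summary features are illustrative assumptions, not taken from the paper.

```python
def nw_optimal_summary(seq1, seq2, c):
    """Return (score, summary) for an optimal global alignment.

    Assumed summary model (d = 2): summary = (mismatches, spaces),
    score = c[0] * mismatches + c[1] * spaces, to be maximized.
    """
    def score(summary):
        return c[0] * summary[0] + c[1] * summary[1]

    m, n = len(seq1), len(seq2)
    # best[i][j] holds one optimal summary for aligning seq1[:i] with seq2[:j].
    best = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = (0, 0)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # align seq1[i-1] with seq2[j-1]
                mis, gaps = best[i - 1][j - 1]
                candidates.append((mis + int(seq1[i - 1] != seq2[j - 1]), gaps))
            if i > 0:  # seq1[i-1] against a space
                mis, gaps = best[i - 1][j]
                candidates.append((mis, gaps + 1))
            if j > 0:  # seq2[j-1] against a space
                mis, gaps = best[i][j - 1]
                candidates.append((mis, gaps + 1))
            best[i][j] = max(candidates, key=score)
    return score(best[m][n]), best[m][n]
```

For example, `nw_optimal_summary("ACGT", "ACTG", (-1, -3))` returns `(-2, (2, 0))`, while the parameters `(-3, -1)` give `(-2, (0, 2))`: the same pair of sequences has different optimal summaries in different regions of parameter space.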

The choice of c reflects relative frequencies of indels and different types of point 
mutations during sequence evolution. The optimal alignment is heavily dependent 
on the choice of c, and yet in practice the "biologically correct" choice of c is not 
known. Given sequences $\sigma_1, \sigma_2$, the space of alignment scoring parameters partitions
into optimality regions. Parameter values in the same optimality region give rise
to the same optimal alignment summary. Parametric alignment [19] is the problem
of determining all possible optimal alignment summaries that arise as c varies, and 
also computing the optimality region for each optimal summary. 

Parametric alignment is a well-studied subject (see [?, ?, 13, ?]) and
surprisingly tractable [19]: Pachter and Sturmfels proved that there are
$O(n^{d(d-1)/(d+1)})$ optimality regions. In [3, 8, ?], oracle-based methods for
computing optimality regions are presented, which repeatedly run the NW algorithm
with different choices of scoring parameters $c$ to find new optimal alignments.
Despite the nice $O(n^{d(d-1)/(d+1)})$ bound on the number of optimality regions, it
has been speculated [9] that the required number of NW calls might be as high as
$\Omega(n^{d^2(d-1)/(2(d+1))})$. One purpose of this paper is to point out that, actually,

Theorem 1. Existing oracle-based methods for parametric alignment only use
$O(n^{d(d-1)/(d+1)})$ calls to the NW algorithm.

Thus parametric alignment is much more tractable than previously thought.
Nevertheless, one major shortcoming of parametric alignment is that it ignores
nearly optimal alignments that are never optimal for any choice of scoring
parameters. Nearly optimal alignments have been studied before (see [17, 21] and
references within), but not in a parametric setting. Thus we propose parametric
k-best alignment, which studies how the k-best alignment summaries vary with
parameters. We consider two variants of the problem:

• Ordered parametric k-best alignment: Compute the collection of all ordered
subsets of k distinct alignment summaries $(s_1, \ldots, s_k)$ which can become the
k-best summaries $c \cdot s_1 \ge c \cdot s_2 \ge \cdots \ge c \cdot s_k \ge \cdots$ under some choice of $c$. For
each such subset $(s_1, \ldots, s_k)$, find all $c$ such that $c \cdot s_1 \ge \cdots \ge c \cdot s_k \ge \cdots$ are
the k-best summaries.

• Unordered parametric k-best alignment: Same problem, but the ordering of
the k-best summaries $\{s_1, \ldots, s_k\}$ is ignored.

The output of parametric k-best alignment is a decomposition of the space of
alignment scoring parameters into k-best optimality regions. All scoring parameters
in a k-best optimality region yield the same list of k-best distinct alignment
summaries.

Although parametric k-best alignment is a natural extension of parametric alignment,
there are two major difficulties which have prevented its study:

• The structure of k-best optimality regions needs to be understood in order to
systematically compute them, and

• naively, we might worry that the number of k-best optimality regions is
exponential in k.

We first address the second point. Notice that the total number of subsets of
k alignment summaries grows exponentially in k. Indeed, if alignment summaries
were arbitrary real-valued points, then the number of k-best optimality regions
could be exponential in k when $k < d/2$. But alignment summaries are integer
points contained in a small volume, and using this fact we have a remarkable result:

Theorem 2. For fixed k, the number of k-best optimality regions is $O(n^{d(d-1)/(d+1)})$,
which matches the best known bound for the k = 1 case. Specifically, for general k,
the number of k-best optimality regions is $O((kn)^{d(d-1)/(d+1)})$ for unordered parametric
k-best alignment, and $O((k^2 n)^{d(d-1)/(d+1)})$ for ordered parametric k-best alignment.

Remark 1. Since there might be $O(n^d)$ alignment summaries, Theorem 2 says that,
remarkably, the number of k-best optimality regions is sublinear in the worst-case
number of alignment summaries if $k = o(n^{1/(d-1)})$.
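The threshold on k can be checked directly from the ordered bound of Theorem 2. The following worked computation is a sketch, assuming the worst case of $n^d$ alignment summaries:

```latex
\begin{align*}
(k^2 n)^{\frac{d(d-1)}{d+1}} = o\!\left(n^{d}\right)
  &\iff k^{\frac{2d(d-1)}{d+1}} = o\!\left(n^{\,d - \frac{d(d-1)}{d+1}}\right)
        = o\!\left(n^{\frac{2d}{d+1}}\right) \\
  &\iff k = o\!\left(n^{\frac{1}{d-1}}\right).
\end{align*}
```

The same computation applied to the unordered bound $(kn)^{d(d-1)/(d+1)}$ gives the weaker requirement $k = o(n^{2/(d-1)})$.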

In order to leverage Theorem 2 and obtain fast parametric k-best alignment, we
need to find those very few k-best optimality regions, without considering all possible
subsets of k summaries. For standard parametric alignment (k = 1), the collection of
optimality regions can be efficiently represented and computed via an object called
the alignment polytope. Polytopes are standard geometric objects which generalize
polygons to higher dimensions. In [?], polytope construction software was used to
efficiently compute alignment polytopes and solve parametric alignment.

For $k > 1$, it was not clear whether k-best optimality regions could be represented
by a polytope as in the $k = 1$ case. At the heart of our paper is the following
affirmative result:

Theorem 3. The collection of k-best optimality regions can be represented by a
polytope called a k-set polytope.

Specifically we define ordered k-set polytopes and unordered k-set polytopes,
respectively, for ordered and unordered parametric k-best alignment. For $k = 1$ the
k-set polytopes are precisely the alignment polytope. Our k-set polytopes elucidate
the structure of k-best optimality regions, and allow us to generalize existing
polytope algorithms for parametric alignment to the k-best setting. For standard
parametric alignment, the oracle-based incremental polytope construction algorithm
in [9] repeatedly runs the NW algorithm as a subroutine, with different choices of
scoring parameters, in order to find new optimal alignment summaries. Here we
generalize the oracle-based incremental polytope construction algorithm to solve
parametric k-best alignment. In our generalized incremental algorithm, the standard
NW algorithm is replaced with a k-best version of NW that finds the k-best
alignment summaries, instead of just the optimal summary. (The running time
of the k-best NW algorithm is only a factor of k larger than the running time of
standard NW.) Our main result is:

Theorem 4. There is an oracle-based incremental polytope construction algorithm
to solve parametric k-best alignment. The algorithm solves unordered parametric
k-best alignment by calling the k-best version of the NW algorithm a total of
$O((kn)^{d(d-1)/(d+1)})$ times. For ordered parametric k-best alignment, the k-best NW
algorithm is called $O((k^2 n)^{d(d-1)/(d+1)})$ times. Besides NW calls, the rest of the
algorithm's running time is $O((kn)^{2d(d-1)/(d+1)})$ for unordered parametric k-best alignment,
and $O((k^2 n)^{2d(d-1)/(d+1)})$ for ordered.

Furthermore, for $d \le 3$ and $k = O(n^{1/4})$ the total running time of our algorithm
is optimal, i.e. the running time is the same as running the NW algorithm once for
each k-best optimality region.

The most important feature of our algorithm's running time is that the dependence
on k is polynomial instead of exponential. For small k the running time
of parametric k-best alignment is comparable to the best known bounds for the
running time of standard parametric alignment. In [9] a whole-genome parametric
alignment of Drosophila is presented, demonstrating that parametric alignment
is feasible in practice. Thus we are confident that parametric k-best alignment
can be performed at the whole-genome scale as well, for not-too-large k. The
incremental polytope construction software iB4e [15] can be used right out of the
box to compute the necessary k-set polytopes, once the k-best version of the NW
algorithm is written.



2 Background on polyhedral geometry 

We begin by reviewing basic definitions and facts in polyhedral geometry. 

Definition 1. The convex hull of a set of points $V = \{v_1, \ldots, v_n\} \subset \mathbb{R}^d$ is the
set $\mathrm{conv}(V) = \{\sum_i c_i v_i \mid \sum_i c_i = 1,\ c_i \ge 0\ \forall i\}$. If $\sum_i c_i = 1$, and all $c_i \ge 0$, we say
$\sum_i c_i v_i$ is a convex combination of $V$.

Definition 2. A polytope is a convex hull of any finite non-empty $V \subset \mathbb{R}^d$.

The dimension of a polytope $P \subset \mathbb{R}^d$ is the dimension of its relative interior
as a manifold. To avoid confusion between $d$ and $\dim P$, $d$ is called the ambient
dimension of $P$.

Definition 3. Given a polytope $P \subset \mathbb{R}^d$ and a vector $c \in \mathbb{R}^d$, the face $F_c \subseteq P$ is
the set $F_c = \{x^* \in P \mid c \cdot x^* = \max_{x \in P} c \cdot x\}$. By convention, the empty set is also
considered to be a face of $P$.

Intuitively faces are the bounding extremities of the polytope. Any face of a
polytope $P$ is again a polytope, whose faces are also faces of $P$. For most choices of
$c$, the face $F_c$ will be a single point, which is called a vertex of $P$. The 1-dimensional
faces are called edges, and $(\dim P - 1)$-dimensional faces are called facets.

Definition 4. Given a polytope $P \subset \mathbb{R}^d$ and a face $F \subseteq P$, the normal cone $N(F)$
is the set of all vectors $c$ for which $F_c \supseteq F$.

In other words, the normal cone $N(F)$ is the set of all vectors $c$ such that $F$
weakly maximizes $c \cdot x$ over $P$. There is a natural duality between faces and normal
cones: for any two faces $F, G \subseteq P$ we have $G \subseteq F$ if and only if $N(G) \supseteq N(F)$. The
relative interiors of the normal cones of a polytope partition $\mathbb{R}^d$, and the collection
of normal cones of all faces of $P$ is called the normal fan of $P$.

In this paper we will be interested in computing normal cones of vertices of a
polytope $P$. By duality, it suffices to know the facets of $P$, as we now explain. For
simplicity assume $\dim P = d$. The facets $F_c$ of $P$ which contain $v$ give the set of
vectors $c$ which generate the normal cone $N(v)$:

$N(v) = \mathbb{R}_{\ge 0}\{c \mid v \in F_c,\ F_c \text{ is a facet of } P\}$

For further reading on polytopes, see [?].



Computing convex hulls 

Polytopes have been extensively studied in computational geometry, and many
algorithms for convex hull construction have been devised [11, ?, ?]. Unfortunately,
traditional convex hull algorithms assume a point set $S$ is explicitly given, for which
conv(S) is to be computed. In sequence alignment we are presented with a quite
different situation. We don't know the set $S$, but we seek to compute vertices and
facets of conv(S), and we have a fast oracle (e.g. the NW algorithm) which will
find a vertex of conv(S) that maximizes the dot-product with a given vector $c$. The
incremental construction algorithm and software reported in [?, 15] builds convex
hulls efficiently in this setting. Briefly put, the incremental construction algorithm
repeatedly queries the vertex-finding oracle with different vectors $c$, adding one
vertex at a time to the polytope, until all vertices of the convex hull are guaranteed to
be found. As shown in [15], we have

Theorem 5. The incremental construction algorithm builds the convex hull of a
point set $S$, and all faces of conv(S), given an oracle FindVertex(c) which maximizes
$c \cdot s$ over $s \in S$ for a given $c$. The oracle is queried $O(V + F)$ times, where $V$ and $F$ are
the number of vertices and facets of conv(S). Besides oracle calls, the running time
is $O(\ell_1 + \cdots + \ell_n)$, where $\ell_j$ is the number of faces of the convex hull after the first
$j$ vertices are added.
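The oracle-driven idea can be illustrated in the plane. The sketch below is not the iB4e implementation; it is a simple divide-and-conquer variant for $d = 2$, assuming conv(S) is two-dimensional and that the oracle always returns a vertex. The names `oracle_convex_hull_2d` and `make_oracle` are hypothetical, and `make_oracle` is a brute-force stand-in for NW over an explicitly listed point set.

```python
def oracle_convex_hull_2d(find_vertex):
    """Vertices of conv(S) in counterclockwise order, plus the oracle-call count.

    find_vertex(c) is assumed to return a vertex of conv(S) maximizing c . s.
    """
    calls = 0

    def query(c):
        nonlocal calls
        calls += 1
        return find_vertex(c)

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    def expand(p, q):
        # Outward normal of the directed edge p -> q on a counterclockwise hull.
        normal = (q[1] - p[1], p[0] - q[0])
        r = query(normal)
        if dot(normal, r) > dot(normal, p):
            # r lies strictly beyond the edge: split the edge and recurse.
            return expand(p, r) + expand(r, q)
        return [p]  # no point lies beyond p -> q, so it is a facet

    left, right = query((-1, 0)), query((1, 0))
    return expand(left, right) + expand(right, left), calls


def make_oracle(points):
    """Brute-force stand-in for NW; lexicographic tie-breaking returns a vertex."""
    return lambda c: max(points, key=lambda s: (c[0] * s[0] + c[1] * s[1], s))
```

Each call to `expand` issues exactly one query, and it either discovers a new vertex or certifies a facet, so the number of queries is $V + F$ in this planar setting, in line with the $O(V + F)$ count of Theorem 5.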

3 Ordered and unordered k-set polytopes



It is straightforward to use polytopes as a tool for standard parametric alignment,
simply by computing vertices and facets of the alignment polytope, i.e. the convex
hull of alignment summaries [?]. We now present special polytope constructions
which are specifically designed for k-best alignment. We begin with ordered
parametric k-best alignment.

We will consider an (implicitly defined, but not explicitly listed) set of alignment
summaries $S = \{s_i\} \subset \mathbb{Z}^d$, and wish to compute the k-best summaries
$(s_1, \ldots, s_k)$ in $S$ with respect to a linear scoring scheme $c \cdot s_1 \ge c \cdot s_2 \ge \cdots$. In
particular we will be interested in computing all possibilities for the k-best summaries
as $c$ varies (where the ordering of the summaries is taken into account). We
define polytopes $P_k$, which we call ordered k-set polytopes, whose vertices correspond
to obtainable tuples of k-best summaries.

Definition 5. Given a set of $N$ alignment summaries $S = \{s_i\} \subset \mathbb{Z}^d$, let $(N)_k$
denote the set of all $N(N-1) \cdots (N-k+1)$ tuples of $k$ distinct indices
$\sigma = (\sigma(1), \ldots, \sigma(k))$, where $\sigma(1), \ldots, \sigma(k) \in \{1, 2, \ldots, N\}$. Notice that the ordering of
the indices is taken into account. The ordered k-set polytope for $S$ is the convex
hull

$P_k = \mathrm{conv}\{\sum_{i=1}^{k} (k + 1 - i)\, s_{\sigma(i)} \mid \sigma \in (N)_k\}$.



3s_1 + 2s_2 + s_3    3s_2 + 2s_1 + s_3    3s_3 + 2s_1 + s_2    3s_4 + 2s_1 + s_2
3s_1 + 2s_2 + s_4    3s_2 + 2s_1 + s_4    3s_3 + 2s_1 + s_4    3s_4 + 2s_1 + s_3
3s_1 + 2s_3 + s_2    3s_2 + 2s_3 + s_1    3s_3 + 2s_2 + s_1    3s_4 + 2s_2 + s_1
3s_1 + 2s_3 + s_4    3s_2 + 2s_3 + s_4    3s_3 + 2s_2 + s_4    3s_4 + 2s_2 + s_3
3s_1 + 2s_4 + s_2    3s_2 + 2s_4 + s_1    3s_3 + 2s_4 + s_1    3s_4 + 2s_3 + s_1
3s_1 + 2s_4 + s_3    3s_2 + 2s_4 + s_3    3s_3 + 2s_4 + s_2    3s_4 + 2s_3 + s_2

Table 1: Example of definition of ordered 3-set polytope $P_3$, when $S$ is a set of
four points $s_1, s_2, s_3, s_4$. In this case $P_3$ is the convex hull of $(4)_3 = 4 \cdot 3 \cdot 2 = 24$
points. The 24 points are listed above.
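The entries of Table 1 can be generated mechanically from Definition 5. The short Python sketch below enumerates the index tuples in $(N)_k$ and prints the corresponding weighted sums symbolically; the function name `ordered_kset_labels` is an illustrative assumption.

```python
from itertools import permutations


def ordered_kset_labels(num_points, k):
    """Symbolic generators of the ordered k-set polytope, one per tuple in (N)_k."""
    labels = []
    for sigma in permutations(range(1, num_points + 1), k):
        terms = []
        for position, index in enumerate(sigma):
            weight = k - position  # equals k + 1 - i for i = position + 1
            terms.append(f"{weight}s{index}" if weight > 1 else f"s{index}")
        labels.append(" + ".join(terms))
    return labels


for label in ordered_kset_labels(4, 3):
    print(label)
```

For $N = 4$ and $k = 3$ this lists 24 distinct expressions, starting with `3s1 + 2s2 + s3`.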



The following theorem shows that the normal fan of $P_k$ gives precisely the ordered
k-best optimality regions for the alignment summaries $S$. Thus computing vertices
and facets of $P_k$ completely solves ordered parametric k-best alignment.

Theorem 6. The normal cone of a point $\sum_{i=1}^{k} (k + 1 - i)\, s_{\sigma(i)} \in P_k$ is the set of all
$c$ satisfying $c \cdot s_{\sigma(1)} \ge c \cdot s_{\sigma(2)} \ge \cdots \ge c \cdot s_{\sigma(k)}$, and $c \cdot s_{\sigma(k)} \ge c \cdot s_j$ for all $j \notin \sigma$.

Proof. See [?]. □

We now give analogous results for unordered parametric k-best alignment.

Definition 6. Given a set of $N$ points $S = \{s_i\} \subset \mathbb{R}^d$, let $\binom{S}{k}$ denote the set of all
$\binom{N}{k}$ subsets of $S$ of size $k$. The unordered k-set polytope $Q_k$ for $S$ is

$Q_k = \mathrm{conv}\{\sum_{s \in A} s \mid A \in \binom{S}{k}\}$.

Unordered k-set polytopes have been previously studied [?, ?]. We can modify
Theorem 6 to show that the normal fan of $Q_k$ gives the (unordered) k-best optimality
regions for alignment summaries:

Theorem 7. The normal cone of a point $\sum_{s \in A} s \in Q_k$ is the set of vectors $c$ which
satisfy $c \cdot s \ge c \cdot s'$ for all $s \in A$ and $s' \notin A$.

Example 1. Suppose $S$ is the set of four vertices of a square. Figure 1 shows some
unordered k-set polytopes $Q_k$ and ordered k-set polytopes $P_k$ for $S$.

[Figure: three panels showing $P_1 = Q_1$, $Q_2$, and $P_2$]

Figure 1: Examples of unordered k-set polytopes $Q_k$ and ordered k-set polytopes
$P_k$, for $S = \{a, b, c, d\}$ = vertices of a square. Points are labeled by the
ordered/unordered k-set they represent. Notice for example the point $2b + d$ is
in the interior of $P_2$; this means that it is impossible for a linear scoring scheme
to make $(b, d)$ the ordered top-2 points.
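The claim about $2b + d$ can be checked numerically. The sketch below assumes the unit square with $a, b, c, d$ labeled counterclockwise, so that $b$ and $d$ are opposite corners (this labeling is our assumption, chosen to match the caption), and uses a standard monotone-chain convex hull rather than any software from the paper.

```python
from itertools import combinations, permutations


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


# Assumed coordinates: unit square, b and d are diagonally opposite.
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}


def weighted_sum(labels, weights):
    return tuple(sum(w * square[l][j] for l, w in zip(labels, weights)) for j in range(2))


P2 = {pair: weighted_sum(pair, (2, 1)) for pair in permutations("abcd", 2)}
Q2 = {pair: weighted_sum(pair, (1, 1)) for pair in combinations("abcd", 2)}
P2_vertices = convex_hull(P2.values())
Q2_vertices = convex_hull(Q2.values())
```

With these coordinates `P2[("b", "d")]` is the point `(2, 1)`, which does not appear in `P2_vertices`; the hull of the twelve generating points of $P_2$ is an octagon whose vertices correspond to ordered pairs of adjacent corners.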



In parametric k-best alignment, the set of points $S$ are alignment summaries,
which are integer points. In this case we can obtain remarkable bounds on the
complexity of k-set polytopes.

Theorem 8. Suppose $S \subset \mathbb{Z}^d$ is a set of $N$ integer points, and $k < N$. Let $V$
be the volume of conv(S), and assume $V > 0$. If $U$ is any subset of vertices of
the unordered k-set polytope $Q_k$ for $S$, then the total number of faces of conv(U) is
$O((k^d V)^{(d-1)/(d+1)})$. Similarly, if $W$ is any subset of the vertices of the ordered k-set
polytope $P_k$ for $S$, then the total number of faces of conv(W) is $O((k^{2d} V)^{(d-1)/(d+1)})$.

Proof. See Appendix. □

This concludes our treatment of k-set polytopes. The rest of the paper gives
applications to parametric k-best alignment, along with details on computation and
implementation.

4 Parametric k-best alignment



We now aggregate the results of the previous sections to efficiently solve parametric
k-best alignment. For given sequences $\sigma_1, \sigma_2$ of length $O(n)$, let $S = \{s_1, \ldots, s_N\}$
be the set of all alignment summaries. Each entry in an alignment summary counts
a feature in the alignment, such as the number of occurrences of a type of mismatch.
Thus conv(S) has volume $O(n^d)$.

We first explain how to solve ordered parametric k-best alignment. For $k < N$
let $P_k$ be the ordered k-set polytope for $S$. Given values for alignment scoring
parameters $c$, the NW algorithm finds the optimal alignment summary by dynamic
programming, keeping track of the optimal summary at each node in the alignment
graph [?]. Similarly, the NW algorithm can compute the top $k$ distinct alignment
summaries using the same type of dynamic programming recursion, keeping track
of the top $k$ distinct summaries at each node in the alignment graph. Thus we can
define an oracle which, given $c$, will find the vertex $s^* \in P_k$ which maximizes $c \cdot s$
over $P_k$:

• Call the NW algorithm with scoring parameters $c$, to compute the top $k$ distinct
alignment summaries $s_1, \ldots, s_k$ such that $c \cdot s_1 \ge \cdots \ge c \cdot s_k \ge \cdots$.

• Return $s^* := k s_1 + (k - 1) s_2 + \cdots + s_k$.
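The two oracle steps above can be sketched in Python for an assumed summary model with $d = 2$, where a summary is the pair (mismatches, spaces). The names `kbest_nw` and `ordered_vertex` are hypothetical; ties are broken lexicographically, which is our choice and not something the text specifies.

```python
def kbest_nw(seq1, seq2, c, k):
    """Top-k distinct summaries (mismatches, spaces) of global alignments, best first."""
    def rank(summary):
        # Higher score first; ties broken lexicographically so the order is total.
        return (-(c[0] * summary[0] + c[1] * summary[1]), summary)

    m, n = len(seq1), len(seq2)
    # top[i][j] holds the k best distinct summaries for seq1[:i] versus seq2[:j].
    top = [[None] * (n + 1) for _ in range(m + 1)]
    top[0][0] = [(0, 0)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            candidates = set()
            if i > 0 and j > 0:  # (mis)match column
                step = int(seq1[i - 1] != seq2[j - 1])
                candidates.update((mis + step, gaps) for mis, gaps in top[i - 1][j - 1])
            if i > 0:  # space in seq2
                candidates.update((mis, gaps + 1) for mis, gaps in top[i - 1][j])
            if j > 0:  # space in seq1
                candidates.update((mis, gaps + 1) for mis, gaps in top[i][j - 1])
            top[i][j] = sorted(candidates, key=rank)[:k]
    return top[m][n]


def ordered_vertex(summaries):
    """s* = k*s_1 + (k-1)*s_2 + ... + s_k for summaries listed best first."""
    k = len(summaries)
    return tuple(sum((k - i) * s[j] for i, s in enumerate(summaries)) for j in range(2))
```

Keeping only k summaries per node is enough because every step adds the same increment to all summaries coming from a given predecessor, so their relative ranking is preserved. For example, `kbest_nw("ACGT", "ACTG", (-1, -3), 2)` returns `[(2, 0), (0, 2)]`, and `ordered_vertex` maps that list to `(4, 2)`.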

Then Theorem 5 says that the incremental polytope construction method will
compute $P_k$ and its normal fan, calling the above oracle $O(V + F)$ times where $V$
and $F$ are the number of vertices and facets of $P_k$. Since conv(S) has volume $O(n^d)$,
Theorem 8 says that $V$ and $F$ are both $O((k^2 n)^{d(d-1)/(d+1)})$. Theorems 5 and 8 also tell
us that besides oracle calls, the incremental polytope construction method takes no
more than $O(V (k^2 n)^{d(d-1)/(d+1)})$ time, which is $O((k^2 n)^{2d(d-1)/(d+1)})$. Putting it all together,
we have

Theorem 9. Given sequences of length $O(n)$, let $V, F$ be the number of vertices
and facets of the ordered k-set polytope for alignment summaries. The incremental
polytope construction method will solve ordered parametric k-best alignment in
$O((V + F)\, W(n,k) + (k^2 n)^{2d(d-1)/(d+1)})$ time, where $W(n,k)$ is the time required to run
k-best NW once on sequences of length $O(n)$.

Typically $W(n,k) = \Theta(kn^2)$. Table 2 gives complexity bounds in this case for
specific small values of $d$.

Remark 2. The upper bound theorem for polytopes [?] says that the number of
faces of a d-dimensional polytope is linear in the number of vertices if $d \le 3$. So if
$d \le 3$, and $W(n,k) = \Omega(kn^2)$, the running time of our algorithm is $O(Vkn^2 + V^2)$,
and since $V = O(k^3 n^{3/2})$, the running time is thus $O(V \cdot W(n,k))$ when $k = O(n^{1/4})$.
This is the same running time as the time required to run the NW algorithm once
for each top k ranking. Thus for $d \le 3$ and $k = O(n^{1/4})$, our algorithm is an
optimal oracle-based method.
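The threshold on k in Remark 2 follows from comparing the two terms of the running time in the binding case $d = 3$. A short worked computation, taking $V = k^3 n^{3/2}$ and $W(n,k) = kn^2$ up to constant factors:

```latex
\begin{align*}
V \cdot W(n,k) &= k^{3} n^{3/2} \cdot k n^{2} = k^{4} n^{7/2}, \qquad
V^{2} = k^{6} n^{3}, \\
V^{2} \le V \cdot W(n,k)
  &\iff k^{3} n^{3/2} \le k n^{2}
   \iff k^{2} \le n^{1/2}
   \iff k \le n^{1/4}.
\end{align*}
```

So the $V^2$ term is dominated by the oracle term exactly when $k = O(n^{1/4})$, which is the regime in which the total running time is $O(V \cdot W(n,k))$.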

Remark 3. When $d = 2$, at most one facet (edge) of the ordered k-set polytope is
deleted when a new vertex is added (otherwise at least one vertex $v$ would also be
deleted, contradicting that $v$ is a vertex). Thus for $d = 2$ and $W(n,k) = \Omega(kn^2)$ the
running time is always the optimal $O(V \cdot W(n,k)) = O(k^{7/3} n^{8/3})$ for any k.

Remark 4. In practice, we have observed that relatively few faces are created or
destroyed when each new vertex of $P_k$ is found. If an amortized $O(1)$ faces are
created or destroyed when each new vertex of $P_k$ is found, then the incremental
polytope construction method solves ordered parametric k-best alignment in
$O((V + F) \cdot W(n,k))$ time, which is $O((k^2 n)^{d(d-1)/(d+1)} \cdot W(n,k))$. It is an important open question
to determine the worst-case number of faces that can be created or destroyed during
the incremental construction of $P_k$.

We now explain how to solve unordered parametric k-best alignment. The solution
is analogous to ordered k-best alignment. Let $Q_k$ be the unordered k-set
polytope for $S$. We define an oracle which, given $c$, will find the vertex $s^* \in Q_k$
which maximizes $c \cdot s$ over $Q_k$:

• Call the NW algorithm with alignment scoring parameters $c$, to compute the
top $k$ distinct alignment summaries $s_1, \ldots, s_k$ such that $c \cdot s_1 \ge \cdots \ge c \cdot s_k \ge \cdots$.

• Return $s^* := s_1 + s_2 + \cdots + s_k$.
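That the returned point really maximizes $c \cdot s$ over the k-set polytope can be sanity-checked by brute force on a small, explicitly listed point set. In the sketch below `top_k` is a stand-in for k-best NW, and all function names are illustrative assumptions.

```python
from itertools import combinations, permutations


def dot(c, s):
    return sum(ci * si for ci, si in zip(c, s))


def add_weighted(points, weights):
    dim = len(points[0])
    return tuple(sum(w * p[j] for p, w in zip(points, weights)) for j in range(dim))


def top_k(S, c, k):
    """Stand-in for k-best NW: rank an explicitly listed S by score."""
    return sorted(S, key=lambda s: dot(c, s), reverse=True)[:k]


def unordered_oracle(S, c, k):
    return add_weighted(top_k(S, c, k), [1] * k)


def ordered_oracle(S, c, k):
    return add_weighted(top_k(S, c, k), range(k, 0, -1))


def brute_force_max(S, c, k, ordered):
    """Largest c . x over every generating point of P_k (ordered) or Q_k (unordered)."""
    if ordered:
        points = (add_weighted(t, range(k, 0, -1)) for t in permutations(S, k))
    else:
        points = (add_weighted(t, [1] * k) for t in combinations(S, k))
    return max(dot(c, x) for x in points)
```

For `S = [(0, 0), (3, 1), (1, 4), (5, 5), (2, 2)]`, `c = (3, -1)` and `k = 2`, the unordered oracle returns `(8, 6)` and the ordered oracle returns `(13, 11)`; in both cases the score of the returned point equals the brute-force maximum.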

Then, endowed with the above vertex-finding oracle, we have

Theorem 10. For sequences of length $O(n)$, let $V, F$ be the number of vertices and
facets of the unordered k-set polytope for alignment summaries. The incremental
polytope construction method will solve unordered parametric k-best alignment in
$O((V + F)\, W(n,k) + (kn)^{2d(d-1)/(d+1)})$ time.

Table 2 gives specific bounds for small values of $d$, assuming $W(n,k) = \Theta(kn^2)$.



d   Output size              Running time                      Output size              Running time
    (unordered)              (unordered)                       (ordered)                (ordered)
2   $O(k^{2/3} n^{2/3})$     $O(k^{5/3} n^{8/3})$              $O(k^{4/3} n^{2/3})$     $O(k^{7/3} n^{8/3})$
3   $O(k^{3/2} n^{3/2})$     $O(k^{5/2} n^{7/2} + k^3 n^3)$    $O(k^3 n^{3/2})$         $O(k^4 n^{7/2} + k^6 n^3)$
4   $O(k^{12/5} n^{12/5})$   $O(k^{24/5} n^{24/5})$            $O(k^{24/5} n^{12/5})$   $O(k^{48/5} n^{24/5})$

Table 2: Running time and output complexity of ordered/unordered parametric
k-best alignment for small dimensions, assuming the k-best version of the
NW algorithm runs in $\Theta(kn^2)$ time.
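The exponents in Table 2 follow from Theorems 9 and 10 with $W(n,k) = kn^2$. The sketch below recomputes them with exact fractions; the function name `table2_exponents` and the dictionary keys are illustrative assumptions. Each entry is a pair (exponent of k, exponent of n).

```python
from fractions import Fraction


def table2_exponents(d, ordered):
    """Exponents (of k, of n) for the output size and the two running-time terms."""
    e = Fraction(d * (d - 1), d + 1)      # exponent d(d-1)/(d+1)
    k_exp = 2 * e if ordered else e       # (k^2 n)^e versus (k n)^e
    return {
        "output": (k_exp, e),             # bound on V (and F)
        "oracle_term": (k_exp + 1, e + 2),  # V * W(n,k) with W = k n^2
        "hull_term": (2 * k_exp, 2 * e),  # work besides oracle calls
    }


for d in (2, 3, 4):
    for ordered in (False, True):
        print(d, "ordered" if ordered else "unordered", table2_exponents(d, ordered))
```

The running-time column of the table lists only the terms that are not dominated: for $d = 2$ the oracle term dominates, for $d = 3$ neither term dominates for all k, and for $d = 4$ the hull term dominates.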



Remark 5. Analogous to Remark 4, the running time would be $O((V + F) \cdot W(n,k))$
if we could prove that an amortized $O(1)$ faces are created or destroyed when each
new vertex of $Q_k$ is found.

Remark 6. Analogous to Remark 2, if $d \le 3$ the running time is the optimal
$O(V \cdot W(n,k))$ when $k = O(n)$ and $W(n,k) = \Omega(kn^2)$.

The software iB4e reported in [15] can be used right out of the box to solve
ordered and unordered parametric k-best alignment this way, once the oracles for
finding vertices of $P_k$ and $Q_k$ are written. A beta version of iB4e was used in [9]
to perform high-throughput parametric alignment, and greatly outperformed the
"polytope semiring" method reported in [19].



5 Discussion 

Although parametric alignment is a major improvement upon standard sequence
alignment, parametric alignment ignores nearly optimal alignments. Here we have
extended parametric alignment to the k-best setting, determining how the top k
alignment summaries vary with scoring parameters. This allows for much more
realistic parametric analysis of biological sequences.

Parametric alignment has enjoyed remarkably good complexity results, enabling
whole-genome parametric analysis of Drosophila genomes [9]. By extending the
good complexity results to parametric k-best alignment, we believe parametric
k-best alignment can be performed at the whole-genome scale as well. As in [9], such
genome-scale parametric analysis will require standard preprocessing techniques
that break up pairwise genomes into smaller reliably homologous subsequences.

In some applications, estimates of scoring parameters might be known along
with confidence intervals on the estimates. In this case we can restrict attention to
optimality regions which intersect the confidence region for parameters. It is possible
to augment the vertex-finding oracle in iB4e so that only optimal alignments whose
optimality regions intersect a prescribed cone $C$ are found; other optimal alignments
are completely avoided. Details can be found in [?]. For example, optimality
regions could be restricted to a cone over a bounding box, as in [?]. Restricting
the parameter space has the additional benefit of speeding up parametric k-best
alignment, by reducing the number of optimality regions.

The dimension $d$ of alignment summaries is the most prohibitive factor in the
complexity of both parametric alignment and k-best alignment. But the curse of
dimension is not nearly as bad as was speculated in [9]. While some polytopes
with $V$ vertices might have $\Theta(V^{\lfloor d/2 \rfloor})$ faces, we have shown that k-set polytopes are
special, and that the remarkable bounds on their number of vertices also apply to
faces of all dimensions. Thus parametric alignment and k-best alignment are much
more tractable than previously thought. This agrees with empirical observations,
e.g. in [9] parametric alignment was demonstrated to be computationally practical
for $d \le 5$ at the whole-genome level. Our complexity results indicate that parametric
k-best alignment will be similarly tractable. Based on computational experience
with parametric alignment, we believe parametric k-best alignment will even be
tractable for $d = 6, 7$ when sequences are short.

It is important to note that restricting alignment summaries to have dimension
$\le 7$ prohibits the most general models of alignment scoring parameters. For protein
sequences, all but the most basic scoring matrices will yield $d > 7$. Thus parametric
alignment is not well-suited for protein sequence analysis. Fortunately, for DNA
sequences, popular scoring models such as those based on Jukes-Cantor, Kimura-2,
and Kimura-3 scoring matrices will result in $d \le 6$.

Parametric alignment belongs to a more general class of algorithms called parametric
inference algorithms for graph-based models [?]. We note that the framework
we have laid out here, extending parametric alignment to the k-best setting,
can be adapted to perform parametric k-best inference in other graph-based models.
The remarkable complexity results we have proved can likewise be extended to the
parametric k-best inference setting. Similarly the software iB4e can be used
to perform efficient parametric k-best inference, provided an oracle which performs
k-best inference given scoring parameters. Two important graph-based models in
biology which can benefit from parametric k-best inference are hidden Markov models
over discrete state spaces, and tree-models for single nucleotide evolution. The
vertex-finding oracles provided to iB4e for these graph-based models would be the
k-best Viterbi algorithm and k-best Felsenstein pruning algorithm respectively.

A Appendix



Ordered and unordered k-set polytopes have rich structure which has still not been
fully explored. Here we recall a result from [?] which shows that k-set polytopes
can be intractable for general sets of points:

Proposition 1. Given a set of N points S C M.'^, let be the number of k- 
dimensional faces of conv{S). Then the ordered k-set polytope for V has at least 
{k + vertices. Thus k-set polytopes have Q{N'^) vertices in the worst case, if 
k < d/2. 

However, alignment summaries are integer points. Suppose $S \subset \mathbb{Z}^d$ is a finite
set of integer points. Then as recalled in [3], we have:

Theorem 11 (Andrews-Bárány). Let $V$ be the volume of conv(S). If $V > 0$, then
the number of k-dimensional faces of conv(S) is $O(V^{(d-1)/(d+1)})$ for every k.

No such result is possible when $S$ is an arbitrary set of real-valued points.

A.1 Proof of Theorem 8

The proof requires a lemma, proved in [14]:

Lemma 1. Suppose $S \subset \mathbb{R}^d$ is a set of $N > 1$ points. If $1 \le k < N$, then the k-set
polytopes $Q_k$ and $P_k$ for $S$ have the same dimension as conv(S).

Proof of Theorem 8. By definition of Minkowski sum, every point in the k-set polytope
$Q_k$ is also a point in the k-fold Minkowski sum $\mathrm{conv}(S)^{\oplus k}$, which equals the
k-fold dilation $k \cdot \mathrm{conv}(S) = \{kx \mid x \in \mathrm{conv}(S)\}$. Thus the volume of $Q_k$ is no more
than the volume of $k \cdot \mathrm{conv}(S)$, which is $k^d V$. By the lemma, the volume of $Q_k$ is
positive, so we can apply Theorem 11 and obtain that the number of faces of $Q_k$ is
$O((k^d V)^{(d-1)/(d+1)})$.

Now, for the ordered k-set polytope $P_k$, we recall that $P_k = Q_1 \oplus \cdots \oplus Q_k$. Since
each $Q_j$ is a subset of $j \cdot \mathrm{conv}(S)$, we have that $P_k \subset k^2 \cdot \mathrm{conv}(S)$. Now $P_k$ is a lattice
polytope of positive volume $\le k^{2d} V$. So Theorem 11 tells us that the number of faces
of $P_k$ is $O((k^{2d} V)^{(d-1)/(d+1)})$. □